Dear CoLoRd development team,
Hope you are doing well.
Issue:
While working with CoLoRd in reference mode, we noticed strange behaviour when the read length is much smaller than the reference length. Our investigation traced it to huge nodes in the similarity graph, which occur far more frequently than expected.
The only reason it usually works is this line: `colord/src/colord/reads_sim_graph.cpp`, line 390 (commit 25b2860).
However, with proper filtering this condition is supposed to hold anyway.
We noticed the problem in reference-based mode because there is no such condition when adding nodes from pseudo-reads to the graph.
Proposed Fix:
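The fix is simply to switch to the correct KMC flag, `-cx`, instead of `-cs`. According to the KMC help, `-cs<value>` sets the maximal value of a counter (more frequent k-mers are kept with a capped count), whereas `-cx<value>` excludes k-mers occurring more than <value> times. The change should also fix the compression logic, since k-mers are supposed to be chosen based on count as well as hash.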
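To illustrate why the flag matters, here is a minimal C++ sketch, assuming toy in-memory k-mer counts rather than a real KMC database. `cap_counts` mimics capping counts (every k-mer survives), `filter_counts` mimics excluding k-mers outside a count range, and `pick_by_hash` is a hypothetical minhash-style selection applied after count filtering. None of these names come from CoLoRd or KMC, and `std::hash` only stands in for whatever hash CoLoRd actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy k-mer -> occurrence count table (not a KMC database).
using Counts = std::map<std::string, uint32_t>;

// Capping semantics: counts above `cs` are clamped, but the k-mer is kept.
Counts cap_counts(const Counts& in, uint32_t cs) {
    Counts out;
    for (const auto& entry : in)
        out[entry.first] = entry.second > cs ? cs : entry.second;
    return out;
}

// Range-filter semantics: k-mers occurring fewer than `ci` or more than
// `cx` times are dropped entirely.
Counts filter_counts(const Counts& in, uint32_t ci, uint32_t cx) {
    assert(ci <= cx && "lower bound must not exceed upper bound");
    Counts out;
    for (const auto& entry : in)
        if (entry.second >= ci && entry.second <= cx)
            out[entry.first] = entry.second;
    return out;
}

// Hypothetical helper: among the count-filtered k-mers, keep the n with the
// smallest hash values (std::hash is used purely for illustration).
std::vector<std::string> pick_by_hash(const Counts& filtered, std::size_t n) {
    std::vector<std::pair<std::size_t, std::string>> hashed;
    hashed.reserve(filtered.size());
    for (const auto& entry : filtered)
        hashed.emplace_back(std::hash<std::string>{}(entry.first), entry.first);
    std::sort(hashed.begin(), hashed.end());
    std::vector<std::string> out;
    for (std::size_t i = 0; i < hashed.size() && i < n; ++i)
        out.push_back(hashed[i].second);
    return out;
}
```

With toy counts {1, 5, 900, 12}, capping at 255 keeps all four k-mers (the 900-occurrence one just reports 255), so it would still be available to link many reads into one huge node. Filtering to the range [2, 100] drops both the singleton and the over-represented k-mer before any hash-based selection happens.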
Testing:
We have performed thorough testing, including specific scenarios with short reads and long references. Feel free to test it yourself.
Acknowledgments:
A special thanks to @iam28th for their assistance in tracing back to the kmc flag.
We hope this improvement will be helpful and improve the results.
Best regards,
Alexey